Chromosome-wide characterization of Y-STR mutation rates using ultra-deep genealogies
Abstract
Although the utility of short tandem repeats on the Y chromosome (Y-STRs) has long been recognized and leveraged in forensics, genealogy and paternity testing, most of these applications have relied on only a few dozen loci identified as having remarkably high mutation rates. Recent efforts have expanded the set of Y-STRs with known mutation rates to two hundred markers, but the limited throughput of capillary-based methods for estimating mutation rates has left the mutability of most Y-STRs uncharacterized, particularly those with dinucleotide repeat units. To address this limitation, we developed a novel method capable of concurrently estimating the mutation rates of all Y-STRs by leveraging population-scale whole-genome sequencing data. Extensive simulations confirmed that our method robustly accounts for PCR stutter artifacts and obtains unbiased mutation rate estimates. Applying the method to orthogonal datasets from the 1000 Genomes Project and the Simons Genome Diversity Project, we used evolutionary data from over 250,000 meioses to estimate the mutation rates of more than 700 Y-STRs with 2-6 base pair repeat units, the largest such set to date. For loci that had been previously characterized, these estimates were highly concordant with those from father-son studies. In addition, we identified nearly 100 previously uncharacterized Y-STRs with per-generation mutation rates greater than 1 in 3000. Altogether, our study provides a broadly applicable method for estimating Y-STR mutation rates from whole-genome sequencing cohorts, outlines a framework for imputing Y-STRs, vastly expands the number of identified loci with high discriminative power and provides the first chromosome-wide characterization of the mutation rates of dinucleotide short tandem repeats.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.
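The abstract does not spell out its estimator, so the following is only a simplified illustration of why deep genealogies carry information about mutation rates; it uses a classic moment relation, not the authors' method (which, per the abstract, also accounts for PCR stutter). Under a symmetric single-step mutation model, two Y chromosomes whose common ancestor lived T generations ago differ in repeat count by an amount whose expected square is 2μT, so pooling squared differences over many pairs gives an estimate of μ. The single-step model, the assumption that TMRCAs are known exactly, and the helper names `simulate_pair` and `asd_mutation_rate` are all hypothetical choices made for this sketch.

```python
import random

# Illustrative sketch only (not the preprint's estimator). ASSUMPTIONS:
# a symmetric single-step mutation model, TMRCAs known without error, and
# no genotyping error (e.g. PCR stutter). Function names are hypothetical.


def simulate_pair(mu, tmrca, rng):
    """Simulate repeat counts for two Y chromosomes whose common ancestor
    lived `tmrca` generations ago, relative to an ancestral count of 0.

    Each generation, each lineage mutates with probability `mu`, gaining or
    losing one repeat unit with equal probability.
    """
    alleles = []
    for _ in range(2):  # two independent lineages descend from the ancestor
        length = 0
        for _ in range(tmrca):
            if rng.random() < mu:
                length += rng.choice((-1, 1))
        alleles.append(length)
    return alleles[0], alleles[1]


def asd_mutation_rate(pairs):
    """Pooled moment estimate of the per-generation mutation rate.

    `pairs` holds (allele_a, allele_b, tmrca_generations) tuples. Under the
    single-step model E[(a - b)**2] = 2 * mu * T, so pooling over pairs gives
    mu_hat = sum((a - b)**2) / sum(2 * T).
    """
    squared = sum((a - b) ** 2 for a, b, _ in pairs)
    exposure = sum(2 * t for _, _, t in pairs)
    if exposure <= 0:
        raise ValueError("total generations separating the pairs must be positive")
    return squared / exposure


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    true_mu, tmrca = 0.01, 100
    pairs = [(*simulate_pair(true_mu, tmrca, rng), tmrca) for _ in range(2000)]
    print(f"true mu = {true_mu}, estimated mu = {asd_mutation_rate(pairs):.4f}")
```

With 2,000 simulated pairs the estimate should land near the simulated rate of 0.01. Because each pair contributes in proportion to its depth, a locus mutating about once in 3000 generations rarely changes in a single father-son transmission yet is expected to change many times across a deep genealogy; real data add stutter noise and estimated rather than known trees, which this sketch ignores.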
bioRxiv preprint first posted online Jan. 15, 2016; doi: http://dx.doi.org/10.1101/036590
Related articles
Population-Scale Sequencing Data Enable Precise Estimates of Y-STR Mutation Rates.
Short tandem repeats (STRs) are mutation-prone loci that span nearly 1% of the human genome. Previous studies have estimated the mutation rates of highly polymorphic STRs by using capillary electrophoresis and pedigree-based designs. Although this work has provided insights into the mutational dynamics of highly mutable STRs, the mutation rates of most others remain unknown. Here, we harnessed ...
Maximum likelihood estimation of locus-specific mutation rates in Y-chromosome short tandem repeats
Motivation: Y-chromosome short tandem repeats (Y-STRs) are widely used for population studies, forensic purposes and, potentially, the study of disease; therefore, knowledge of their mutation rates is valuable. Here we present a novel method for estimating site-specific Y-STR mutation rates from partial phylogenetic information within a maximum likelihood framework. Results: Given Y-STR data clas...
Mitochondrial and Y chromosome haplotype motifs as diagnostic markers of Jewish ancestry: a reconsideration
Several authors have proposed haplotype motifs based on site variants at the mitochondrial genome (mtDNA) and the non-recombining portion of the Y chromosome (NRY) to trace the genealogies of Jewish people. Here, we analyzed their main approaches and tested the feasibility of adopting motifs as ancestry markers through construction of a large database of mtDNA and NRY haplotypes from public genet...
Towards Improvements in the Estimation of the Coalescent: Implications for the Most Effective Use of Y Chromosome Short Tandem Repeat Mutation Rates
Over the past two decades, many short tandem repeat (STR) microsatellite loci on the human Y chromosome have been identified together with mutation rate estimates for the individual loci. These have been used to estimate the coalescent age, or the time to the most recent common ancestor (TMRCA) expressed in generations, in conjunction with the average square difference measure (ASD), an unbiase...
Y-chromosome short tandem repeat intermediate variant alleles DYS392.2, DYS449.2, and DYS385.2 delineate new phylogenetic substructure in human Y-chromosome haplogroup tree.
Aim: To determine the human Y-chromosome haplogroup backgrounds of intermediate-sized variant alleles displayed by short tandem repeat (STR) loci DYS392, DYS449, and DYS385, and to evaluate the potential of each intermediate variant to elucidate new phylogenetic substructure within the human Y-chromosome haplogroup tree. Methods: Molecular characterization of lineages was achieved using a combi...